Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM

Authors

  • M. M. Homayounpour Computer Engineering and IT Department, Amirkabir University of Technology, Tehran, Iran
Abstract:

Improving phoneme recognition has attracted the attention of many researchers because of its applications across speech processing. Recent work shows that using deep neural networks (DNNs) in speech recognition systems significantly improves their performance. DNN-based phoneme recognition systems have two phases: training and testing. Most previous research has tried to improve the training phase, for example through training algorithms, network types, network architectures, and feature types. In this study, we focus instead on the test phase, in which the phoneme sequence is generated and which is equally essential to good phoneme recognition accuracy. Past research applied the Viterbi algorithm to a hidden Markov model (HMM) to generate phoneme sequences. We address an important problem with this method: an HMM implicitly assumes a geometric distribution for state durations. To overcome it, we use the real duration probability distribution of each phoneme with the aid of a hidden semi-Markov model (HSMM). We also represent each phoneme with a single state so that phoneme duration information can be used directly in the HSMM. Furthermore, we investigate a post-processing method that corrects the phoneme sequence obtained from the neural network based on our knowledge about phonemes. Experimental results on the Persian FarsDat corpus show that the extended Viterbi algorithm on the HSMM improves phoneme recognition accuracy by 2.68% and 0.56% over conventional methods using Gaussian mixture model-hidden Markov models (GMM-HMMs) and Viterbi on HMM, respectively. The post-processing method further increases accuracy.
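The decoding idea in the abstract can be illustrated with a minimal sketch of explicit-duration (segmental) Viterbi decoding with one state per phoneme. This is not the authors' implementation; the abstract does not specify the details of their extended Viterbi algorithm. The function name `hsmm_viterbi`, its argument names, and the maximum duration `D` are assumptions made for illustration. In a standard HMM, a state with self-transition probability a has the implicit duration law P(d) = a^(d-1)(1-a); here that law is replaced by an arbitrary per-phoneme table `log_dur`.

```python
import numpy as np

def hsmm_viterbi(log_post, log_dur, log_trans, log_init):
    """Explicit-duration Viterbi decoding with one state per phoneme.

    All names and shapes are illustrative assumptions:
    log_post:  (T, K) per-frame log scores for each phoneme (e.g. DNN outputs)
    log_dur:   (K, D) log P(duration = d + 1 frames | phoneme)
    log_trans: (K, K) log P(next phoneme | previous phoneme)
    log_init:  (K,)   log P(first phoneme)
    Returns a list of (phoneme, start_frame, end_frame_exclusive).
    """
    T, K = log_post.shape
    D = log_dur.shape[1]
    # cum[t, k] = sum of log_post[:t, k]; a segment score is a difference.
    cum = np.vstack([np.zeros((1, K)), np.cumsum(log_post, axis=0)])
    # best[t, k]: best score for frames [0, t) whose last segment is phoneme k.
    best = np.full((T + 1, K), -np.inf)
    # back[t, k] = (duration of last segment, phoneme of previous segment).
    back = np.zeros((T + 1, K, 2), dtype=int)

    for t in range(1, T + 1):
        for k in range(K):
            for d in range(1, min(D, t) + 1):
                seg = cum[t, k] - cum[t - d, k] + log_dur[k, d - 1]
                if d == t:
                    # The segment starts the utterance.
                    score, prev = log_init[k] + seg, -1
                else:
                    cand = best[t - d] + log_trans[:, k]
                    prev = int(np.argmax(cand))
                    score = cand[prev] + seg
                if score > best[t, k]:
                    best[t, k] = score
                    back[t, k] = (d, prev)

    k = int(np.argmax(best[T]))
    if not np.isfinite(best[T, k]):
        raise ValueError("no valid segmentation under the given model")

    # Trace back from the last frame to recover the segments.
    segments = []
    t = T
    while t > 0:
        d, prev = (int(v) for v in back[t, k])
        segments.append((k, t - d, t))
        t, k = t - d, prev
    return segments[::-1]
```

Setting the diagonal of `log_trans` to negative infinity forbids self-transitions, so the time spent in a phoneme is governed only by `log_dur`. The triple loop costs on the order of T·K·D·K operations, which is acceptable for a sketch but would normally be vectorised.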


Similar resources

Improving Articulatory Feature and Phoneme Recognition Using Multitask Learning

Speech sounds can be characterized by articulatory features. Articulatory features are typically estimated using a set of multilayer perceptrons (MLPs), i.e., a separate MLP is trained for each articulatory feature. In this paper, we investigate a multitask learning (MTL) approach for joint estimation of articulatory features with and without phoneme classification as a subtask. Our studies show th...


Discriminative kernel-based phoneme sequence recognition

We describe a new method for phoneme sequence recognition given a speech utterance, which is not based on the HMM. In contrast to HMM-based approaches, our method uses a discriminative kernel-based training procedure in which the learning process is tailored to the goal of minimizing the Levenshtein distance between the predicted phoneme sequence and the correct sequence. The phoneme sequence p...


Phoneme recognition using acoustic events

This paper presents a new approach to phoneme recognition using nonsequential sub-phoneme units. These units are called acoustic events and are phonologically meaningful as well as recognizable from speech signals. Acoustic events form a phonologically incomplete representation as compared to distinctive features. This problem may partly be overcome by incorporating phonological constraints. Cu...


End-to-end Phoneme Sequence Recognition using Convolutional Neural Networks

Most state-of-the-art phoneme recognition systems rely on classical neural network classifiers fed with highly tuned features, such as MFCC or PLP features. Recent advances in “deep learning” approaches have called such systems into question, but while some attempts have been made with simpler features such as spectrograms, state-of-the-art systems still rely on MFCCs. This might be viewed as a kind of failure f...


Phoneme recognition using time-delay neural networks

In this paper we present a Time-Delay Neural Network (TDNN) approach to phoneme recognition which is characterized by two important properties. 1) Using a 3-layer arrangement of simple computing units, a hierarchy can be constructed that allows for the formation of arbitrary nonlinear decision surfaces. The TDNN learns these decision surfaces automatically using error backpropagation [1]. 2) Th...


Speaker recognition using phoneme-specific GMMs

This paper compares three approaches to building phoneme-specific Gaussian mixture model (GMM) speaker recognition systems on the NIST 2003 Extended Data Evaluation to a baseline GMM system covering all of the phonemes. The individual performance of any given phoneme-specific GMM system falls below the performance of the baseline GMM, but fusing the top 40 performing scores of the individual ph...




Journal title

Volume 7, Issue 1

Pages 137-147

Publication date 2019-03-01
